Improving performance of fuzzy string matching against a dictionary [closed]

Posted by Nathan Harmston on Programmers See other posts from Programmers or by Nathan Harmston
Published on 2011-02-09T13:47:55Z Indexed on 2011/02/09 15:33 UTC
Read the original article Hit count: 484

Filed under:
|
|

Hi,

So I'm currently working for with using SecondString for fuzzy string matching, where I have a large dictionary to compare to (with each entry in the dictionary has an associated non-unique identifier). I am currently using a hashMap to store this dictionary.

When I want to do fuzzy string matching, I first check to see if the string is in the hashMap and then I iterate through all of the other potential keys, calculating the string similarity and storing the k,v pair/s with the highest similarity. Depending on which dictionary I am using this can take a long time ( 12330 - 1800035 entries ). Is there any way to speed this up or make it faster? I am currently writing a memoization function/table as a way of speeding this up, but can anyone else think of a better way to improve the speed of this? Maybe a different structure or something else I'm missing.

Many thanks in advance,

Nathan

© Programmers or respective owner

Related posts about java

Related posts about data-structures